The future of AI relies on a high school teacher's free database

The Japan Times

In front of a suburban house on the outskirts of the northern German city of Hamburg, a single word, "LAION," is scrawled in pencil across a mailbox. It's the only indication that the home belongs to the person behind a massive data-gathering effort central to the artificial intelligence boom that has seized the world's attention. That person is high school teacher Christoph Schuhmann, and LAION, short for "Large-scale AI Open Network," is his passion project. When Schuhmann isn't teaching physics and computer science to German teens, he works with a small team of volunteers building the world's biggest free AI training data set, which has already been used in text-to-image generators such as Google's Imagen and Stable Diffusion. Databases like LAION are central to AI text-to-image generators, which rely on them for the enormous amounts of visual material used to deconstruct and create new images.


QUAM-AFM: A Free Database for Molecular Identification by Atomic Force Microscopy

#artificialintelligence

This paper introduces the Quasar Science Resources–Autonomous University of Madrid atomic force microscopy image data set (QUAM-AFM), the largest data set of simulated atomic force microscopy (AFM) images, generated from a selection of 685,513 molecules that span the most relevant bonding structures and chemical species in organic chemistry. QUAM-AFM contains, for each molecule, 24 3D image stacks, each consisting of constant-height images simulated for 10 tip–sample distances with a different combination of AFM operational parameters, resulting in a total of 165 million images with a resolution of 256 × 256 pixels. The 3D stacks are especially well suited to chemical identification in AFM experiments using deep learning techniques. The data provided for each molecule include, besides a set of AFM images, ball-and-stick depictions, IUPAC names, chemical formulas, atomic coordinates, and a map of atom heights. To simplify the use of the collection as a source of information, we have developed a graphical user interface that allows searching for structures by CID number, IUPAC name, or chemical formula.
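The image total quoted in the abstract can be checked with a short calculation. The sketch below multiplies the three figures the abstract gives (molecules, stacks per molecule, tip–sample distances per stack); the constant and function names are illustrative and do not come from the QUAM-AFM release.

```python
# Figures taken from the abstract; the names themselves are hypothetical.
NUM_MOLECULES = 685_513      # molecules in the data set
STACKS_PER_MOLECULE = 24     # one 3D stack per combination of AFM parameters
IMAGES_PER_STACK = 10        # one constant-height image per tip-sample distance


def total_images(molecules: int = NUM_MOLECULES,
                 stacks: int = STACKS_PER_MOLECULE,
                 heights: int = IMAGES_PER_STACK) -> int:
    """Return the number of simulated images implied by the three counts."""
    return molecules * stacks * heights


if __name__ == "__main__":
    total = total_images()
    # Report the exact count and its value in millions.
    print(f"{total:,} images (about {total / 1e6:.1f} million)")
```

The exact product is 164,523,120, which is consistent with the rounded figure of 165 million images stated in the abstract.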